Semantic Associative Topic Models for Information Retrieval
Abstract
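Topic models are widely used for document modeling and in speech recognition, information retrieval, and text mining systems, where they capture the semantic and statistical information of documents and words. Most topic models, such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), describe the relationship between documents and words through a set of latent topic probability distributions and use it to extract the latent semantics of documents. However, traditional topic models are constrained by the bag-of-words assumption, so their latent topics capture semantic information only between individual words. Although individual words can convey topical information, they sometimes lack precise semantic knowledge of the text, which can cause documents to be misjudged and lowers retrieval quality. To address this shortcoming, this paper proposes novel semantic associative topic models that take into account the semantic associations among multi-words. Association mining is used to extract the mutual association information among multi-words, which is then combined with the traditional PLSA model through linear model combination, improving on PLSA. We evaluate the approach on the Wall Street Journal and Associated Press news collections. Experimental results show that the new method yields better document modeling than traditional topic models and also improves document retrieval performance.

Keywords: topic model, probabilistic latent semantic analysis, association mining, language model, information retrieval.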
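The abstract does not give the model equations, so the following is only a minimal sketch, under stated assumptions, of the three ingredients it names: a PLSA model fitted by EM, association rules mined from word co-occurrence, and a linear combination of the resulting models for query-likelihood retrieval. Everything beyond that outline is hypothetical and chosen for illustration: the function names (`train_plsa`, `mine_pair_rules`, `association_model`, `score`), treating "multi-words" as document-level word pairs, normalizing rule confidences into trigger probabilities, the mixture weights, and the background-smoothing term. The paper's actual formulation may differ.

```python
import math
import random
from collections import Counter
from itertools import combinations


def train_plsa(docs, num_topics, iters=50, seed=0):
    """Fit PLSA by EM on non-empty token lists; return P(w|z) dicts and P(z|d) lists."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    p_w_z = []
    for _ in range(num_topics):  # random positive init breaks topic symmetry
        raw = {w: rng.random() + 1e-3 for w in vocab}
        total = sum(raw.values())
        p_w_z.append({w: v / total for w, v in raw.items()})
    p_z_d = [[1.0 / num_topics] * num_topics for _ in docs]
    for _ in range(iters):
        exp_wz = [dict.fromkeys(vocab, 0.0) for _ in range(num_topics)]
        exp_zd = [[0.0] * num_topics for _ in docs]
        for d, c in enumerate(counts):
            for w, n in c.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d)
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(num_topics)]
                norm = sum(joint)
                for z in range(num_topics):
                    r = n * joint[z] / norm  # expected count n(d,w) * P(z|d,w)
                    exp_wz[z][w] += r
                    exp_zd[d][z] += r
        # M-step: renormalize the expected counts
        for z in range(num_topics):
            total = sum(exp_wz[z].values())
            p_w_z[z] = {w: v / total for w, v in exp_wz[z].items()}
        p_z_d = [[v / sum(row) for v in row] for row in exp_zd]
    return p_w_z, p_z_d


def mine_pair_rules(docs, min_support=2, min_conf=0.5):
    """Hypothetical rules a -> b from document co-occurrence; value = supp(a,b) / supp(a)."""
    sets = [set(d) for d in docs]
    item_supp = Counter(w for s in sets for w in s)
    pair_supp = Counter(p for s in sets for p in combinations(sorted(s), 2))
    rules = {}
    for (a, b), n in pair_supp.items():
        if n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            if n / item_supp[x] >= min_conf:
                rules[(x, y)] = n / item_supp[x]
    return rules


def association_model(rules):
    """Trigger probabilities P(w|a): rule confidences normalized per antecedent a."""
    totals = Counter()
    for (a, _), conf in rules.items():
        totals[a] += conf
    trig = {}
    for (a, w), conf in rules.items():
        trig.setdefault(a, {})[w] = conf / totals[a]
    return trig


def score(query, d, docs, p_w_z, p_z_d, trig, weights=(0.7, 0.1, 0.1)):
    """log P(q|d) under a linear mixture: alpha*ML + beta*PLSA + gamma*association,
    with the remaining weight on the collection (background) model."""
    alpha, beta, gamma = weights
    coll = Counter(w for doc in docs for w in doc)
    coll_len = sum(coll.values())
    dc, dlen = Counter(docs[d]), len(docs[d])
    logp = 0.0
    for w in query:
        if w not in coll:
            continue  # skip words unseen in the collection
        p_ml = dc[w] / dlen
        p_plsa = sum(t.get(w, 0.0) * p_z_d[d][z] for z, t in enumerate(p_w_z))
        p_assoc = sum(n / dlen * trig.get(a, {}).get(w, 0.0) for a, n in dc.items())
        p_bg = coll[w] / coll_len
        logp += math.log(alpha * p_ml + beta * p_plsa + gamma * p_assoc
                         + (1 - alpha - beta - gamma) * p_bg)
    return logp


if __name__ == "__main__":
    docs = [s.split() for s in ("stock market trading stock price",
                                "market price stock investors",
                                "speech recognition acoustic model speech",
                                "acoustic speech signal recognition")]
    p_w_z, p_z_d = train_plsa(docs, num_topics=2)
    trig = association_model(mine_pair_rules(docs))
    query = "stock price".split()
    ranked = sorted(range(len(docs)), reverse=True,
                    key=lambda d: score(query, d, docs, p_w_z, p_z_d, trig))
    print("ranking (best first):", ranked)
```

The background term keeps every probability positive for words seen in the collection, so the logarithm stays defined even when a document, its topics, and its association rules all assign a query word zero mass. The weights here are arbitrary placeholders; in practice they would be tuned on held-out queries.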
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Genetic Optimization for Associative Semantic Ranking Models of Satellite Images by Land Cover
Associative methods for content-based image ranking by semantics are attractive due to the similarity of generated models to human models of understanding. Although they tend to return results that are better understood by image analysts, the induction of these models is difficult to build due to factors that affect training complexity, such as coexistence of visual patterns in same images, ove...
Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information
With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...
Improving Search on the Semantic Desktop using Associative Retrieval Techniques
While it is agreed that semantic enrichment of resources would lead to better search results, at present the low coverage of resources on the web with semantic information presents a major hurdle in realizing the vision of search on the Semantic Web. To address this problem we investigate how to improve retrieval performance in a setting where resources are sparsely annotated with semantic info...
Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
The current topic modeling approaches for Information Retrieval do not allow query-oriented latent topics to be modeled explicitly. Moreover, the semantic coherence of the topics has never been considered in this field. We propose a model-based feedback approach that learns Latent Dirichlet Allocation topic models on the top-ranked pseudo-relevant feedback, and we measure the semantic coherence of those...
Comparison of Topic Language Models for Query Disambiguation in Information Retrieval
A long-standing challenge in information retrieval is to disambiguate query words for more precise search results. However, two or more meanings of a word in a query, or polysemy, deteriorate the precision effectiveness of information retrieval systems. There is a need for correct and effective information retrieval in many information systems such as health care and customer relationship manag...
Publication date: 2010